Sequence Analysis: A non-independent energy-based multiple sequence alignment improves prediction of transcription factor binding sites
Abstract
Motivation: Multiple sequence alignments (MSAs) are usually scored under the assumption that the sequences being aligned have evolved by common descent, so that the differences between sequences reflect the impact of insertions, deletions and mutations. However, non-coding DNA binding sequences, such as transcription factor binding sites (TFBSs), are frequently not related by common descent, and existing alignment scoring methods are therefore not well suited to aligning them.
Results: We present a novel MSA methodology that scores TFBS DNA sequences by accounting for the interdependence of neighboring bases. We introduce two variants that rest on different underlying hypotheses, one statistical and the other thermodynamic. We assessed the alignments through their performance in TFBS prediction: both methods show considerable improvements over standard MSA algorithms. Moreover, the thermodynamically based variant (EDNA) outperforms the statistical one due to improved stability in the base stacking free energy of the alignment. EDNA can be downloaded from http://sourceforge.net/projects/msa-edna/.
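To make the idea of scoring neighboring bases jointly more concrete, the following minimal sketch contrasts a position-independent score with one computed over adjacent base pairs. It is an illustration only and is not EDNA's scoring function: it assumes ungapped, equal-length sites, a uniform background, and information content as the score, and the function names are hypothetical. It does not use base stacking free energies, which the abstract does not list.

```python
import math
from collections import Counter


def information_content(symbols, alphabet_size):
    """Information content in bits of the observed symbols,
    measured against a uniform background (an assumption)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def independent_score(aligned):
    """Sum of per-column scores: every position is treated independently."""
    length = len(aligned[0])
    return sum(
        information_content([seq[i] for seq in aligned], 4)
        for i in range(length)
    )


def neighbor_score(aligned):
    """Sum of scores over adjacent column pairs: each dinucleotide is
    scored jointly, so dependence between neighboring bases counts."""
    length = len(aligned[0])
    return sum(
        information_content([seq[i:i + 2] for seq in aligned], 16)
        for i in range(length - 1)
    )


# Toy aligned sites (hypothetical data): the two columns co-vary perfectly.
sites = ["AC", "GT", "AC", "GT"]
print(independent_score(sites))  # 2.0 bits
print(neighbor_score(sites))     # 3.0 bits
```

In this toy input the two columns always change together, so the joint score exceeds the independent one by 1 bit, which is exactly the mutual information between the columns that a position-independent score ignores.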
Related articles
Species Selection for Phylogeny-Based Motif Detection
Detecting conserved regions in multiple species alignment is crucial when modeling orthologous entities. However, in phylogenetic analysis of entities other than genes, for instance transcription factor binding sites (TFBS), this proves to be non-trivial due to the high functional turnover and incomplete orthology even within close species, such as Drosophila clade. Having more species does not...
ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes
With an increasing number of vertebrate genomes being sequenced in draft or finished form, unique opportunities for decoding the language of DNA sequence through comparative genome alignments have arisen. However, novel tools and strategies are required to accommodate this large volume of genomic information and to facilitate the transfer of predictions generated by comparative sequence alignme...
Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution
Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequence...
ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites
Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoter as well as distal enhancers. TFs recognize short, but specific binding sites (TFBSs) that are located within the promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent ...
Publication date: 2013